Find the number of cars per family

The example below shows how to count the number of passenger cars per family. It has some common features with the example Find the number of passenger cars per individual. But in this case, a two-step aggregation is performed: First, the number of passenger cars per individual is counted using the collapse(count) command when in a passenger car dataset, then the information is linked into a person dataset that, among other things, contains a variable with family identifiers. Then the data is aggregated to the family level using the collapse command, now using the family as the aggregation measure.

To find numbers for families that do not own a passenger car, it is important to use a person dataset (total population) as a basis, i.e., to link the number of passenger cars into a person dataset (and not the other way around), and then aggregate to the family level. Individuals for whom there is no link from the passenger car dataset will, by definition, be individuals who do not own a passenger car.

 require no.ssb.fdb:32 as db

// Creating person dataset with family number
create-dataset population
import db/BEFOLKNING_KOMMNR_FORMELL 2024-01-01 as municipality
import db/NUDB_BU 2023-08-01 as education
import db/INNTEKT_LONN 2022-12-31 as salary
import db/BEFOLKNING_FOEDSELS_AAR_MND as birth_date
import db/BEFOLKNING_REGSTAT_FAMNR 2023-01-01 as famnr

generate age = 2024 - int(birth_date/100)
generate education_level = substr(education,1,1)
destring education_level


// Creating car dataset
create-dataset cars
import db/KJORETOY_KJT_GRUP 2023-12-31 as vehicle_group
import db/KJORETOY_KJORETOYID_FNR 2023-12-31 as idnr
tabulate vehicle_group
keep if vehicle_group == '101'

// Aggregating to individual level and finding the number of cars per individual
collapse (count) vehicle_group -> num_cars, by(idnr)

// Linking information about the number of cars to the person dataset
merge num_cars into population


// Aggregating person dataset up to family level and creating statistics over the number of cars per family
use population
collapse(sum) num_cars salary -> sum_salary_family (max) education_level -> max_education_level age -> oldest_age, by(famnr)

recode max_education_level (0/3 = 1 'Not completed secondary')(4/5 = 2 'Secondary education')(6 = 3 'Lower academy education')(7 = 4 'Higher academy education (master level)')(8 = 5 'Research education')(9 = 6 'Not stated')

// Cloning variable for number of cars, and combining number over 2 into a combined category for one variant
clone-variables num_cars -> num_cars_raw
recode num_cars (missing = 0 'None')(3/max = 3 '>')
recode num_cars_raw (missing = 0 'None')

textblock
Total population (families) distributed by number of cars
endblock
tabulate num_cars
piechart num_cars

textblock
Number of cars per family distributed by highest education level in the family
endblock
tabulate max_education_level num_cars, rowpct missing
piechart num_cars if max_education_level == 1
piechart num_cars if max_education_level == 2
piechart num_cars if inrange(max_education_level,3,5)
barchart(mean) num_cars_raw, over(max_education_level)

textblock
Average total family income and age of the oldest person in the family distributed by number of cars
endblock
barchart(mean) sum_salary_family, over(num_cars)
barchart(mean) oldest_age, over(num_cars)


//Creating geographical divisions

//Retrieving the most recently updated place of residence at the individual level and linking it to the family dataset via the family number (which is equal to the identifier of the oldest person in the family) - the residence information will then point to the place of residence of the oldest person in the family. This will, in practice, also be the family's place of residence.
create-dataset residence
import db/BEFOLKNING_KOMMNR_FORMELL 2024-01-01 as municipality
merge municipality into population on famnr

use population
generate county = substr(municipality,1,2)

define-labels county_label '03' Oslo '11' Rogaland '15' 'Møre og Romsdal' '18' Nordland '31' Østfold '32' Akershus '33' Buskerud '34' Innlandet '39' Vestfold '40' Telemark '42' Agder '46' Vestland '50' Trøndelag '55' Troms '56' Finnmark
assign-labels county county_label

//Cloning municipality variable and extracting cities from one variable (the rest are placed in a combined category)
clone-variables municipality -> city
replace city = '9999' if !inlist(city,'0301','1103','3201','3301','4204','4601','5001','5501')
recode city ('9999' = '9999' 'Other municipalities')

textblock
Number of cars per family distributed by place of residence and top 10 municipalities ranked by the proportion of families that do not own a car
endblock
tabulate county num_cars, rowpct rowsort(0)
barchart(mean) num_cars_raw, over(county)

tabulate municipality num_cars, rowpct rowsort(0) bottom(10)
tabulate city num_cars, rowpct rowsort(0)
barchart(mean) num_cars_raw, over(city)